Negative Conditional Entropy of Post-Selected States
Abstract. Using the Dirac complex distribution, and hence the statistics of weak measurements, we discuss a decomposition of the "conditional state" of post-selected systems and introduce an entropic measure of information for them. In doing so we remark on the role of pre- and post-selection in the measurement of an ensemble. Conditional states are the quantum analogues of conditional probabilities. We define them by selecting a particular condition in the measurement of a quantum system and studying a coarse-grained set of events in the history of the state that ended in that particular condition. These states differ from what are known as conditional states in the literature [Td1|5], in the sense that they are trace-1 operators and, by construction, can be measured using weak measurements. We then define a conditional entropic measure based on these states which, as opposed to its classical counterpart, can take negative values, even for single-state systems. This negative conditional entropy quantifies the amount of information in the post-selected ensembles, states which signify a non-separable class of histories of a quantum system.



1 Introduction 

Dirac introduced a complex phase-space distribution to make an "Analogy Between Classical and Quantum Mechanics" [8]. This distribution, however, is not limited to phase space. In fact, any two operators with non-vanishing overlap between each of their eigenstates can be used to provide such a distribution. This is because any quantum state of a $d$-dimensional Hilbert space can be described with $d^2 - 1$ elements. Hence a space of any two such observables of a system is sufficient to gain all the information available in the state. The Dirac distribution has recently been studied extensively from a theoretical point of view [T31 [T^, and the experimental procedure for its direct measurement for a general quantum state has been given jTT]. Hence we shall not review these concepts in depth in this paper. Our main results are Eqs. (9) and (12), where we utilise weak measurement statistics to give a decomposition of states which contain information about a single quantum system, relevant to a particular post-selection. Not only does this decomposition signify a particular pre- and post-selected ensemble, it also allows us to encompass a particular set of weak measurements to be done at the intermediate time. We then use those states to define an entropy relevant to such information about histories. The remarkable fact about defining the conditional states of Eq. (9) is that they give a clear picture of the differences between conditional states in classical mechanics and their counterparts in quantum theory. Specifically, one can define an entropic measure, as we do in Eq. (12), that can have negative values, even in the case of single-state systems.
2 Preliminaries

The Dirac distribution of a system, with a choice of operators $A$ and $B$, is given by

$$\Pr(a_m, b_n) = \mathrm{Tr}[\rho A_m B_n], \tag{1}$$



where $\rho$ is the density operator, $a_m$ and $b_n$ are the eigenvalues of the chosen operators $A$ and $B$, and $A_m$ and $B_n$ are the projectors onto the corresponding eigenstates. Any choice of operators $A$ and $B$ will give a complete set of Dirac probabilities, as long as they have the same Hilbert space dimension as the state $\rho$ to be described, the eigenvectors of the two operators are mutually non-orthogonal, and none of those eigenvectors is orthogonal to the original state $\rho$. We notice that any quantum state can be represented in a Dirac decomposition as follows:

$$\rho = \sum_{m,n} \Pr(a_m, b_n) \frac{|a_m\rangle\langle b_n|}{\langle b_n|a_m\rangle}. \tag{2}$$



The Dirac distribution satisfies all the conditions of a classical Kolmogorov probability distribution, except that it is not a positive, real function; it is normalised and gives the correct marginals,

$$\sum_n \Pr(a_m, b_n) = \sum_n \mathrm{Tr}[\rho A_m B_n] = \mathrm{Tr}[\rho A_m] \tag{3}$$

and

$$\sum_m \Pr(a_m, b_n) = \sum_m \mathrm{Tr}[\rho A_m B_n] = \mathrm{Tr}[\rho B_n], \tag{4}$$



and obeys the sum rule and the product rule. The Dirac distribution is also compatible with Bayes' law. It was shown that the negative and complex values of this function are due to the non-commutativity of the quantum mechanical observables [13]. Hence, it should not be surprising that the probability distribution underlying a theory based on non-commuting observables is generally complex and lies outside the $[0, 1]$ interval. Furthermore, it has recently been shown by Morita et al. that, to satisfy Gleason's theorem in explicitly time-symmetric models of quantum theory without coarse graining, one has to adopt complex probability measures p3] (for our comment on their results, see the appendix). This "fundamental probability distribution" has recently been rediscovered by Hartle in the context of histories-based quantum theories [10].
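The properties listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the qutrit state `psi`, the computational basis for $A$ and the discrete Fourier basis for $B$ are illustrative choices of ours, not taken from the analysis above, and the function name `dirac_distribution` is hypothetical.

```python
import numpy as np

def dirac_distribution(rho, a_vecs, b_vecs):
    """Pr[m, n] = Tr[rho A_m B_n], with A_m, B_n projectors onto columns m, n."""
    d = rho.shape[0]
    pr = np.empty((d, d), dtype=complex)
    for m in range(d):
        A_m = np.outer(a_vecs[:, m], a_vecs[:, m].conj())
        for n in range(d):
            B_n = np.outer(b_vecs[:, n], b_vecs[:, n].conj())
            pr[m, n] = np.trace(rho @ A_m @ B_n)
    return pr

# Assumed example: A diagonal in the computational basis, B diagonal in the
# discrete Fourier basis, so every overlap <a_m|b_n> is non-zero.
d = 3
a_vecs = np.eye(d, dtype=complex)
row, col = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
b_vecs = np.exp(2j * np.pi * row * col / d) / np.sqrt(d)

# Assumed pure state (any normalised vector would do).
psi = np.array([1.0, 2.0j, -1.0])
psi = psi / np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

pr = dirac_distribution(rho, a_vecs, b_vecs)

total = pr.sum()             # normalisation: should equal 1
marginal_a = pr.sum(axis=1)  # Eq. (3): sum over n gives Tr[rho A_m]
marginal_b = pr.sum(axis=0)  # Eq. (4): sum over m gives Tr[rho B_n]
# Individual entries need not be real or non-negative.
has_nonclassical_entries = bool(
    np.any(np.abs(pr.imag) > 1e-12) or np.any(pr.real < 0)
)
```

For this choice the entry `pr[0, 0]` is purely imaginary, while the total and both marginals are ordinary probabilities.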

Hartle introduced the same distribution function to describe alternative histories. In the context of histories-based models of quantum theory, an individual history $\alpha$ of a quantum state is characterised by a sequence of projectors at each time,

$$C_\alpha = P_n(t_n) \cdots P_1(t_1). \tag{5}$$



Such chains are not generally projections themselves unless all of the members of the chain commute. In Hartle's work, the extended probability for a history is introduced as

$$\Pr(\alpha) = \mathrm{Re}[\langle\psi|C_\alpha|\psi\rangle]. \tag{6}$$



This is indeed the real part of the Dirac distribution in Eq. (1) for a pure initial state, where the two projectors used to describe the quantum state are replaced by the chain of projectors describing the history $C_\alpha$.¹ In this context one may not be interested in the imaginary part. However, as was shown by Johansen, the imaginary part contains extra information that can be used to analyse the dynamics of the system [13].

One may notice the relevance of the aforementioned probabilities to weak values [1]. Given a projection operator $\Pi$ onto an arbitrary state, for an ensemble which is pre- and post-selected in $|\psi\rangle$ and $|\phi\rangle$ respectively, the weak value of the operator is given by

$$\Pi_w = \frac{\langle\phi|\Pi|\psi\rangle}{\langle\phi|\psi\rangle}. \tag{7}$$



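This can be interpreted simply as the conditional Dirac distribution of the state $|\psi\rangle$, with the condition of being post-selected in the state $|\phi\rangle$. Defining $\rho_\psi := |\psi\rangle\langle\psi|$ and $P_\phi := |\phi\rangle\langle\phi|$, such a conditional probability is calculated using Bayes' law as

$$\Pi_w = \Pr(\Pi|\phi) = \frac{\mathrm{Tr}[\rho_\psi P_\phi \Pi]}{\mathrm{Tr}[\rho_\psi P_\phi]}. \tag{8}$$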



Remembering that a history, in general, is described by a set of probabilities, Eq. (8) implies that pre- and post-selection in weak measurement is a method to separately measure the different amplitudes, or histories, that contribute to a quantum state, as illustrated in Figure 1.² It has been discussed in the weak measurement literature that, using such selections, one can ask counter-factual questions in quantum theory [2]. For instance, Eq. (8) asks what the history of the system $\rho_\psi$ ending in $\rho_\phi$ was, as revealed by weak measurements, without destroying the state of the system before the post-selection measurement. We shall clarify this point further in the next section, when we calculate the states encoding information about these histories.

¹Here, for simplicity and without loss of generality, we study the case of pure states. However, it was shown by Halliwell that Consistent Histories can be extended to mixed initial states [9]. The general mixed-state measurement procedure using Dirac distributions was discussed by Lundeen and Bamber [17]. However, such extensions are not our concern here.

Nevertheless, to gain some intuition about what these histories are, consider the 3-box problem [4]. Here a state is prepared, at time $t = 0$, in a superposition of three boxes $A$, $B$ and $C$, e.g. $|\psi\rangle = \frac{1}{\sqrt{3}}(|A\rangle + |B\rangle + |C\rangle)$. At a later time $t = 2$, the system is measured to be in some other state which is not orthogonal to $|\psi\rangle$, e.g. $|\phi\rangle = \frac{1}{\sqrt{3}}(|A\rangle + |B\rangle - |C\rangle)$. Asking about the probability that the system was in box $A$ at time $t = 1$ and ended in state $|\phi\rangle$ at time $t = 2$ is a particular question of history. Similarly, one can ask this question about boxes $B$ and $C$. An explicit example of the 3-box problem is calculated at the end of the next section. As discussed by Hartle, in the presence of interference these probabilities can take negative values. In this case, to put it in the language of Bayesian probability theory, the histories represent instances where one cannot settle a bet with a single basic measurement [10].
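The 3-box question can be made concrete with a short numerical sketch. It assumes NumPy and represents the boxes as the standard basis vectors of a three-dimensional space; the two-step chain $C_\alpha = P_\phi(t{=}2)\,P_{\text{box}}(t{=}1)$ used for Eq. (6) is our reading of the history "in the box at $t = 1$, then found in $|\phi\rangle$ at $t = 2$", and the helper name `weak_value` is hypothetical.

```python
import numpy as np

A, B, C = np.eye(3)                  # box states |A>, |B>, |C>
psi = (A + B + C) / np.sqrt(3)       # pre-selection at t = 0
phi = (A + B - C) / np.sqrt(3)       # post-selection at t = 2

def weak_value(projector, pre, post):
    """Eq. (7): <post|Pi|pre> / <post|pre>."""
    return (post.conj() @ projector @ pre) / (post.conj() @ pre)

P_phi = np.outer(phi, phi.conj())
weak_values = {}
extended_probs = {}
for name, box in (("A", A), ("B", B), ("C", C)):
    P_box = np.outer(box, box.conj())
    # Conditional Dirac probability of the box given post-selection in |phi>
    weak_values[name] = weak_value(P_box, psi, phi)
    # Eq. (6) for the assumed chain C_alpha = P_phi P_box
    extended_probs[name] = (psi.conj() @ P_phi @ P_box @ psi).real
```

The weak values for boxes $A$ and $B$ come out as 1 and that for box $C$ as $-1$, and the extended probability of the history passing through box $C$ is negative ($-1/9$), which is the interference effect referred to above.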



Figure 1: Different histories, distinguished by a unique post-selection in the $|\phi^{(n)}\rangle$'s, and events $A$, $B$ and $C$ that can be counter-factually argued for, using weak measurements.

3 Quantum conditional states and conditional entropy

The concept of a "conditional state" is defined in different ways [121 HSl E]. When used in the equation of classical conditional entropy, $H(A|B) = -\mathrm{Tr}[\rho_{AB} \ln(\rho_{B|A})]$, it gives the correct quantum conditional entropy. However, the trace of such conditional states is generally not equal to one, hence they cannot be regarded as physical states; one cannot directly measure them.

²Note that this diagram depicts neither what is normally referred to as a "history" in the consistent histories literature, as the lines do not necessarily represent paths in spacetime, nor a history in the sense of the two-state formalism, as none of the axes represents time. Instead, this diagram intends to give a pictorial view of the histories of a state by emphasising that every post-selection singles out a different history to be summed to get the full picture of a quantum event.

Conditional states as we shall define them in the following are measurable physical states, akin to the conditional states of classical mechanics. As pointed out in Section 2, post-selection is a method to distinguish one alternative history from the others. Events in each particular history are characterised by the weak values: the results of weak measurements of the events with a pre- and post-selection for that particular history.

For instance, in Figure 1, assume a particle that can be in three different boxes, $A$, $B$ and $C$. A particular choice of pre- and post-selection, say $|\psi\rangle$ and $|\phi\rangle$, determines a unique history, and that history can be measured using weak measurements [4]. In other words, this unique history with respect to a particular choice of pre- and post-selection means that a specific choice of $|\psi\rangle$ and $|\phi\rangle$ gives a unique conditional probability in Eq. (8), with the condition "ending in state $|\phi\rangle$" being selected.

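Using the concepts discussed above, we construct the density matrix corresponding to a conditional Dirac distribution,

$$\rho_{\psi|\phi} = \sum_{\text{history}} \Pr(\text{history}|\phi) \frac{|\text{history}\rangle\langle\phi|}{\langle\phi|\text{history}\rangle}. \tag{9}$$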



This is the density operator of all events with the condition that the state is post-selected in the state $|\phi\rangle$. All possible final states $\{|\phi\rangle\}$ form a complete orthonormal basis set. Here $|\text{history}\rangle$ represents the quantum state of one possible history, and the sum runs over all possible histories. For instance, if the system could be in the three boxes $A$, $B$ and $C$, then $|A\rangle$ is one possible history state $|\text{history}\rangle$; the others are $|B\rangle$ and $|C\rangle$. The state for this specific example is calculated later in this section in Eq. (18). The states thus defined are trace-one operators and can indeed be measured as they are, by construction, determined by weak measurements.

If we now multiply $\rho_{\psi|\phi}$ by the probability $\Pr(\phi)$ of the system ending in the state $|\phi\rangle$ and sum Eq. (9) over all possible $\phi$'s, we retrieve the density operator of Eq. (2), which contains the full information about the system,

$$\rho = \sum_\phi \Pr(\phi)\, \rho_{\psi|\phi}, \tag{10}$$

with $\Pr(\phi) = \mathrm{Tr}[\rho |\phi\rangle\langle\phi|]$ being the probability of the system being in the state $|\phi\rangle$ at the time of post-selection. This is what is expected from the analogy with classical probabilities.
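Eqs. (9) and (10) can be illustrated with a minimal NumPy sketch for the 3-box states. The two additional post-selection states below are a hypothetical orthonormal completion of $|\phi\rangle$ chosen by us for convenience (they are not the states used later in the worked example), and the helper name `conditional_state` is likewise an assumption.

```python
import numpy as np

A, B, C = np.eye(3)
psi = (A + B + C) / np.sqrt(3)
phi = (A + B - C) / np.sqrt(3)

# Hypothetical completion of |phi> to an orthonormal post-selection basis,
# built so that no basis state is orthogonal to |psi> or to any box state.
u1 = (A - B) / np.sqrt(2)
u2 = (A + B + 2 * C) / np.sqrt(6)
post_basis = [phi, (u2 + u1) / np.sqrt(2), (u2 - u1) / np.sqrt(2)]

def conditional_state(pre, post, histories):
    """Eq. (9): sum over histories of Pr(h|post) |h><post| / <post|h>."""
    rho = np.zeros((pre.size, pre.size), dtype=complex)
    for h in histories:
        # Conditional Dirac probability (weak value of |h><h|)
        pr_h = (post.conj() @ h) * (h.conj() @ pre) / (post.conj() @ pre)
        rho += pr_h * np.outer(h, post.conj()) / (post.conj() @ h)
    return rho

rho_psi = np.outer(psi, psi.conj())
cond_states = [conditional_state(psi, post, (A, B, C)) for post in post_basis]
# Pr(phi) = Tr[rho |phi><phi|] for a pure pre-selected state
probs = [abs(post.conj() @ psi) ** 2 for post in post_basis]
# Eq. (10): weighting by Pr(phi) and summing recovers the full state
recombined = sum(p * r for p, r in zip(probs, cond_states))
```

Each conditional state has unit trace but is not Hermitian, while the weighted sum `recombined` coincides with $|\psi\rangle\langle\psi|$, in line with the analogy with classical probabilities.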

One may note that the operator $\rho_{\psi|\phi}$ is not Hermitian. It has been argued in the weak measurement literature why this should not be a cause for concern [17]. We can now define a conditional entropy of post-selected systems in correspondence with the classical conditional entropy. In classical information theory, the entropy of a random variable $X$ conditioned on the selection of a particular instance $y$ of a random variable $Y$ is $H(X|Y = y) = -\sum_x \Pr(x|y) \ln \Pr(x|y)$, where $\Pr(x|y)$ is the conditional probability of $x$ given the occurrence of $y$. The classical conditional entropy $H(X|Y)$ is then given by $H(X|Y) = \sum_y \Pr(y) H(X|Y = y)$. Similarly, in quantum theory one can define an entropy of a state $|\psi\rangle$ conditioned on post-selection in one out of a possible set of $|\phi\rangle$'s as

$$S_c(\Psi|\Phi = \phi) = -\frac{1}{2} \mathrm{Tr}[\rho_{\psi|\phi} \ln(\rho_{\psi|\phi} \rho^\dagger_{\psi|\phi})], \tag{11}$$

where $\rho^\dagger_{\psi|\phi}$ is the conjugate transpose of $\rho_{\psi|\phi}$. We note that this particular form of the entropy is the average of $-\ln\sqrt{\rho_{\psi|\phi}\rho^\dagger_{\psi|\phi}}$. The argument of the logarithm, $\sqrt{\rho_{\psi|\phi}\rho^\dagger_{\psi|\phi}}$, gives the singular values of the operator $\rho_{\psi|\phi}$. The singular value decomposition is used as a generalisation of diagonalisation to calculate the logarithm of the density operator $\rho_{\psi|\phi}$. Such a generalisation is required due to the form of the density operator in Eq. (2), defined with the eigenstates of the $A$ and $B$ operators. Indeed, for Hermitian operators such as $\rho_\psi$, the singular values and the eigenvalues are the same. At this point it is worth noting that, similar to other entropies previously defined for quantum systems, one can define other objects which serve as a type of entropy. However, the choice made here is the most natural one. For instance, one could define the function $S'_c(\Psi|\Phi = \phi) = -\frac{1}{2} \mathrm{Tr}[\rho_{\psi|\phi} \rho^\dagger_{\psi|\phi} \ln(\rho_{\psi|\phi} \rho^\dagger_{\psi|\phi})]$ as such an information measure. However, this would not correspond to the average of the logarithm. The choice we made is indeed an average of the logarithm of the density operator of the system, decomposed in the appropriate basis. Finally, a quantum conditional entropy can be defined as

$$S_c(\Psi|\Phi) = \sum_\phi \Pr(\phi)\, S_c(\Psi|\Phi = \phi). \tag{12}$$



This entropy, as can be verified in the example later in this section, is a measure of information about a particular set of histories of a quantum mechanical state.

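To find the upper and lower bounds of the new quantum conditional entropy, we need to take into account that the conditional states are obtained by summing over all histories. Hence they are independent of any particular history. To see this explicitly, we use Eq. (8) to obtain

$$\Pr(\text{history}|\phi) = \frac{\langle\phi|\text{history}\rangle\langle\text{history}|\psi\rangle}{\langle\phi|\psi\rangle}, \tag{13}$$

and inserting this into Eq. (9) yields

$$\rho_{\psi|\phi} = \sum_{\text{history}} \frac{|\text{history}\rangle\langle\text{history}|\psi\rangle\langle\phi|}{\langle\phi|\psi\rangle} = \frac{|\psi\rangle\langle\phi|}{\langle\phi|\psi\rangle}, \tag{14}$$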



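which is a previously known equivalent form of the state describing a pre- and post-selected ensemble. Inserting Eq. (14) into Eq. (11) we obtain

$$S_c(\Psi|\Phi = \phi) = -\frac{1}{2} \frac{\langle\phi|\ln(\rho_{\psi|\phi}\rho^\dagger_{\psi|\phi})|\psi\rangle}{\langle\phi|\psi\rangle} = \ln|\langle\phi|\psi\rangle|. \tag{15}$$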



Just as several different ensembles can be described using the same density operator, several different histories can be described using the same conditional density operator. Remarkably, the path independence of conditional states holds for longer sequences of questions about the history of a state, such as the general one in Eq. (5). This is why, in general, one can calculate quantum mechanical probabilities using the path integral formalism without the paths being the real histories of a particle moving in spacetime.

Thus, even though we ask questions about the history using weak measurement and the conditional Dirac distribution, we obtain unique probabilities. The quantum conditional state is sensitive only to the specific choice of post-selection. Hence, we can exploit this form of the state, and apply it in Eq. (12), to calculate the entropy bounds as follows:

$$S_c(\Psi|\Phi) = \sum_\phi \Pr(\phi) \ln|\langle\phi|\psi\rangle| = \frac{1}{2} \sum_\phi \Pr(\phi) \ln \Pr(\phi). \tag{16}$$



Now we can see that $S_c$ has a negative lower bound of $\frac{1}{2} \ln \frac{1}{d}$, where $d$ is the Hilbert space dimension of the system. This lower bound is reached when all $|\phi\rangle$'s have the same overlap with $|\psi\rangle$, i.e. in the case of equidistribution of post-selected states. $S_c$ is bounded from above by the von Neumann entropy of the system $\rho$ itself (Eq. (U)).
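The entropy of Eq. (11) and the lower bound can be checked numerically with the following sketch, which assumes NumPy. Since $\rho_{\psi|\phi}\rho^\dagger_{\psi|\phi}$ is rank-deficient for a pure pre-selected state, the logarithm here is taken only on its support (zero eigenvalues are dropped); this regularisation, the tolerance `tol`, and the function names are our assumptions. The equal-overlap case uses a box state pre-selection with a discrete Fourier post-selection basis, again as an illustrative choice.

```python
import numpy as np

def conditional_state(pre, post):
    """Eq. (14): |pre><post| / <post|pre>."""
    return np.outer(pre, post.conj()) / (post.conj() @ pre)

def entropy_given_post(rho_cond, tol=1e-12):
    """Eq. (11): -1/2 Tr[rho ln(rho rho^dagger)], log restricted to the support."""
    gram = rho_cond @ rho_cond.conj().T          # Hermitian, positive semi-definite
    evals, evecs = np.linalg.eigh(gram)
    keep = evals > tol                           # assumed cutoff for zero eigenvalues
    log_gram = (evecs[:, keep] * np.log(evals[keep])) @ evecs[:, keep].conj().T
    # The trace is real for states of the form of Eq. (14); keep the real part.
    return (-0.5 * np.trace(rho_cond @ log_gram)).real

def conditional_entropy(pre, post_basis):
    """Eq. (12): sum over post-selections of Pr(phi) S_c(Psi | Phi = phi)."""
    total = 0.0
    for post in post_basis:
        pr = abs(post.conj() @ pre) ** 2
        total += pr * entropy_given_post(conditional_state(pre, post))
    return total

# 3-box states: a single post-selection in |phi>
A, B, C = np.eye(3)
psi = (A + B + C) / np.sqrt(3)
phi = (A + B - C) / np.sqrt(3)
s_phi = entropy_given_post(conditional_state(psi, phi))

# Equal-overlap case: box state |A> post-selected in the Fourier basis
d = 3
row, col = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
fourier = np.exp(2j * np.pi * row * col / d) / np.sqrt(d)
s_bound = conditional_entropy(A.astype(complex), [fourier[:, n] for n in range(d)])
```

Under these assumptions `s_phi` agrees with $\ln|\langle\phi|\psi\rangle| = -\ln 3$ from Eq. (15), and `s_bound` equals $\frac{1}{2}\ln\frac{1}{3}$, the lower bound for $d = 3$.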

Example: 3-box problem [4]

Assume there are three boxes, $A$, $B$ and $C$, where a quantum particle can be found. If a system is prepared in the state $|\psi\rangle = \frac{1}{\sqrt{3}}(|A\rangle + |B\rangle + |C\rangle)$ and is later selected to be in the state $|\phi\rangle = \frac{1}{\sqrt{3}}(|A\rangle + |B\rangle - |C\rangle)$, one can calculate the conditional probabilities using Eq. (8) as

$$\Pr(A|\phi) = 1, \quad \Pr(B|\phi) = 1, \quad \Pr(C|\phi) = -1, \tag{17}$$

which are the same as the weak values of the 3-box problem, pre-selected in the state $|\psi\rangle$ and post-selected in the state $|\phi\rangle$. The conditional state of histories, with the condition $|\phi\rangle$ being selected, defined in Eq. (9), is

$$\rho_{\psi|\phi} = \sum_{\text{box} \in \{A,B,C\}} \Pr(\text{box}|\phi) \frac{|\text{box}\rangle\langle\phi|}{\langle\phi|\text{box}\rangle}. \tag{18}$$



Once such a state is measured and/or calculated, one can work out the conditional entropy of the system with the selected state $|\phi\rangle$, given in Eq. (11), to be $S_c(\Psi|\Phi = \phi) = -\ln 3$. Now, to calculate the conditional entropy of Eq. (12), given that the Hilbert space dimension of the system is three, one needs to perform the same calculation as above for two other post-selected states, $|\phi'\rangle$ and $|\phi''\rangle$, to have a sufficient number of Dirac probabilities to completely describe the state. The only restrictions on the choice of these two post-selected states are that they need to be mutually non-orthogonal to the state of each box, $|A\rangle$, $|B\rangle$ and $|C\rangle$, and that they need, together with our original state $|\phi\rangle$, to span the Hilbert space of the state $|\psi\rangle$. Take these states to be



$$|\phi'\rangle = \dots + \frac{-3 + \sqrt{3}}{6}|C\rangle \tag{19}$$

and

$$|\phi''\rangle = \dots |B\rangle + \dots |C\rangle. \tag{20}$$



The same calculation of Eq. (11) as in the case of post-selection in $|\phi\rangle$, repeated for $|\phi'\rangle$ and $|\phi''\rangle$, gives $S_c(\Psi|\Phi = \phi') = -\log_3 4.10$ and $S_c(\Psi|\Phi = \phi'') = -\log_3 1.10$. With probabilities $\Pr(\phi) = 0.11$, $\Pr(\phi') = 0.06$ and $\Pr(\phi'') = 0.83$, the conditional entropy of Eq. (12) becomes $S_c(\Psi|\Phi) = -0.26$, with the logarithm being calculated in base 3.
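The arithmetic behind this figure can be reproduced with a few lines of Python. The sketch below uses only the rounded probabilities and entropies quoted in the text, combined according to Eq. (12); the dictionary keys and the helper `log3` are hypothetical names.

```python
import math

def log3(x):
    """Logarithm in base 3."""
    return math.log(x) / math.log(3)

# Rounded values quoted in the text
pr = {"phi": 0.11, "phi'": 0.06, "phi''": 0.83}
s_given = {
    "phi": -log3(3.0),      # -ln 3 expressed in base 3
    "phi'": -log3(4.10),
    "phi''": -log3(1.10),
}

# Eq. (12): probability-weighted sum of the conditional entropies
s_c = sum(pr[k] * s_given[k] for k in pr)

# Lower bound (1/2) log_3 (1/d) for d = 3
lower_bound = 0.5 * log3(1.0 / 3.0)
```

With these inputs `s_c` rounds to the quoted $-0.26$ and lies between the lower bound of $-0.5$ (in base 3) and zero.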

As we see above, conditional entropies can have negative values in the presence of quantum interference. This is what we expect. To clarify this concept, we emphasise once again that the post-selection gives a class of histories, or amplitudes, where the particle ended in the state $|\phi\rangle$. This is partial information about a "non-separable" part of a system, and hence it produces a negative partial entropy in the quantum system, akin to the negative conditional entropy in higher dimensions. Nevertheless, this is just measuring a particular part of the amplitude of a single system, as opposed to one separate particle in a non-separable bipartite system.

Conclusion In this work we have calculated conditional states as operators in two equivalent forms: one in the Dirac distribution decomposition of Eq. (9) and one in the more condensed form of Eq. (14). The Dirac decomposition serves as a unified formalism for inferring statements about particular retrodictive reasoning on quantum events, whereas the latter form is used in calculations which concern only the results of the pre- and post-selection, regardless of particular histories. We interpreted them as states summarising information about a distinct set of histories of a post-selected quantum state. These states, as opposed to the conditional states in [SJ [TB], are trace-1 operators and are the quantum analogue of classical conditional probabilities in the sense that one particular condition is selected and the probabilities of other relevant events are studied. However, we also showed that these states have properties beyond their classical counterparts due to non-commutativity in quantum mechanics. Most notably, conditional quantum entropies as defined in this work can have negative values, similar to non-separable states in multi-partite systems. Previously, negative entropies were not defined for the case of single particles. We shall discuss the single-particle non-locality of these states in a forthcoming correspondence. As a final point of the discussion, we wish to mention that, similar to other entropies previously defined for quantum systems, one can define other entropic measures for post-selected ensembles. However, as we discussed in the paper, ours is the most natural choice among them.
A Appendix: On the Interpretation of Complex Probabilities

In this work we have shown that complex probabilities, which emerge as a consequence of the non-commutativity of quantum mechanics, can be used as a calculational tool to analyse the results of histories and their measurements. However, we left the question of the intelligibility of these extended probabilities open, as it is not intended to be the focus of this research. Nonetheless, a remark on their interpretation can be helpful in discovering the fundamental role they play as an underlying logical structure of nature.

Several interpretations of probability have been suggested so far [TTj. The Frequentist and the Bayesian interpretations are among the most widely received, with the Frequentist interpretation leading by far in terms of popularity. The reason for such wide use is its success in offering a logical structure that can be used to reason about the classical world as we know it. Nevertheless, that does not mean this interpretation is free of pitfalls, even in the realm of classical mechanics. Laplace, a champion of these classical probabilities, described [M] probabilities as

"The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible."

This interpretation, being the classical one from which the Frequentist interpretation descends, has apparent fundamental issues. To determine such probabilities, one needs to measure an infinite sequence of events. If probabilities are defined to describe the physical world, then one cannot try a random event an infinite number of times. Hence that probability can never be determined. This cannot be remedied by simply limiting the number of trials to a large number, because this method gives different probabilities for different sequences of trials. Alternatively, de Finetti suggested that probabilities are strategies for making fair bets [7]. It has been argued why, in classical mechanics, probabilities in de Finetti's interpretation are real positive values [10]. The reason is that otherwise the bookie, to adopt the language of subjective probability, can offer a 'Dutch Book' by some choice of stakes. However, this problem does not occur in the case of quantum complex probabilities, since these probabilities emerge in the presence of interference. Hence the non-commutativity of operators does not allow the bookie to offer a Dutch Book, as the bet cannot be settled by a single basic measurement on one particle. Thus, there is no guarantee on the result of the bet each time the measurement is performed. Even in the case of weak measurement, the measurement result from a single particle has an uncertainty greater than what is required to settle a bet. Therefore these probabilities are compatible with the Bayesian interpretation. Moreover, relaxing mathematical axioms to arrive at self-consistent mathematical frameworks that best describe physical phenomena is not unfamiliar to contemporary science. In the same way that the underlying geometry of nature in the presence of gravity is understood to be non-Euclidean, one should not be reluctant to adopt non-Kolmogorovian probability theory as an underlying mathematical structure of nature at the quantum level.

On the other hand, some care must be taken regarding the extent to which we interpret these probabilities. As we have shown in this paper, these functions can be used to describe the dynamics of a quantum system in alternative histories, namely the histories that end in a particular state, distinguished by post-selection. However, states are not often naturally post-selected in a state, i.e. quantum states do not have one single deterministic history. They are composed of several alternative histories, each ending in a different state undetermined in advance, over which we sum, weighted by the probability of each post-selection, to describe the complete quantum state. Hence, in interpreting complex probabilities, one has to bear in mind that they are a consequence of having a joint distribution of two interfering states, or alternative histories, in one single state. Thus, in the two-state formulations of quantum mechanics [5] and two-state probability measures [18], it is correct that quantum mechanics, on its own, is time-symmetric and that the evolution backwards in time is described by the dual states to the ones evolving forwards in time. Nonetheless, regardless of the future of a state, one can always post-select according to a different alternative history, and consequently get different probabilities and different measurement results, because they describe different physical situations. In particular, the "Fate of the Universe" [3] need not be post-selected in a particular state, and our cosmological observations need not be a reflection of such post-selection. Hence, we conclude that the two-state formalism and pre- and post-selection are methods to put boundary conditions on a set of histories to single them out, as opposed to giving a description of the whole system and its histories by only one choice of post-selection.



